ARABASE: A Relational Database for Arabic OCR Systems
Authors
Abstract
In this paper we present a database for research on the optical recognition of Arabic off-line and on-line handwriting as well as of machine-printed text. ARABASE includes digital images of documents, text phrases, words/sub-words, isolated characters, digits, signatures, and so on. The data covers a variety of lexicons (city names, literal amounts, isolated characters, digits, free texts, etc.). The database organization offers useful facilities that an Arabic writing recognition system can exploit. A tool with a graphical interface lets the user experiment with different classical image-processing tasks.
Similar resources
Probabilistic Management of OCR Data using an RDBMS
The digitization of scanned forms and documents is changing the data sources that enterprises manage. To integrate these new data sources with enterprise data, the current state-of-the-art approach is to convert the images to ASCII text using optical character recognition (OCR) software and then to store the resulting ASCII text in a relational database. The OCR problem is challenging, and so t...
A Database for Arabic Printed Character Recognition
Electronic Document Management (EDM) technology is being widely adopted because it makes the routing and retrieval of documents efficient. Optical Character Recognition (OCR) is an important front end for such technology. Excellent OCR now exists for Latin-based languages, but there are few systems that read Arabic, which limits the penetration of EDM into Arabic-speaking countries. In developing...
A Survey of Robust hybrid approach for Arabic character recognition
In this paper we present an Arabic character recognition system dedicated to automatic reading (ACR, Arabic Character Recognition). The developed system is a fuzzy classifier: Fuzzy Logic (FL) combined with an Expert System (ES) to extract the topological and contextual information of each printed character. This combination is very useful to improve the power of Hybrid Intellig...
A Comprehensive Isolated Farsi/Arabic Character Database for Handwritten OCR Research
This paper presents a new comprehensive database of isolated offline handwritten Farsi/Arabic numbers and characters for use in optical character recognition research. The database is freely available for academic use. So far, no such freely available database exists for the Farsi language. Grayscale images of 52,380 characters and 17,740 numerals are included. Each image was scanned from Iranian scho...
Arabase - A Database Combining Different Arabic Resources with Lexical and Semantic Information
Language resources are an important factor in any NLP application. However, language resource support for Arabic is poor because the existing Arabic language resources are scattered, inconsistent, or even incomplete. In this paper we discuss the notion of an integrated Arabic resource that leverages various pre-existing ones. We present a comparison between these resources then we pre...
Journal title:
- Int. Arab J. Inf. Technol.
Volume 2, issue
Pages -
Publication date: 2005